12. Slope After Cleaning
Question:
In outliers/outlier_cleaner.py, you will find the skeleton for a function called outlierCleaner() that you will fill in with a cleaning algorithm. It takes three arguments: predictions is a list of predicted targets that come from your regression, ages is the list of ages in the training set, and net_worths is the actual value of the net worths in the training set. There should be 90 elements in each of these lists (because the training set has 90 points in it). Your job is to return a list called cleaned_data that has only 81 elements in it, which are the 81 training points where the predictions and the actual values (net_worths) have the smallest errors (90 * 0.9 = 81). The format of cleaned_data should be a list of tuples, where each tuple has the form (age, net_worth, error).
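One way the cleaning step described above could be filled in is sketched below. This is a minimal sketch, not the course's reference solution. It assumes the three inputs are flat sequences of numbers (if your script passes NumPy column arrays, flatten them first) and it uses the absolute residual as the error, which is an assumption: squared error would rank the points in the same order.

```python
def outlierCleaner(predictions, ages, net_worths):
    """Return the 90% of points with the smallest prediction error.

    Each returned item is a tuple of the form (age, net_worth, error).
    """
    # Pair every training point with its absolute residual error.
    # Assumption: inputs are flat sequences of plain numbers.
    points = [
        (age, net_worth, abs(prediction - net_worth))
        for prediction, age, net_worth in zip(predictions, ages, net_worths)
    ]

    # Sort by error, smallest first.
    points.sort(key=lambda point: point[2])

    # Keep 90% of the points using integer arithmetic (90 points -> 81).
    n_keep = len(points) * 9 // 10
    cleaned_data = points[:n_keep]
    return cleaned_data
```

Integer arithmetic is used for the cut-off so the count does not depend on floating-point rounding.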
Once this cleaning function is working, you should see the regression result change.
What is the new slope?
Is it closer to the “correct” result of 6.25?
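To see why the slope should move after cleaning, here is a toy sketch on made-up data rather than the course's data set. The helper fit_line is a hypothetical stand-in for the regression in the course script. The data follow net worth = 6.25 * age exactly, except for one corrupted point.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up training data: one point out of ten is corrupted.
ages = list(range(20, 30))
net_worths = [6.25 * age for age in ages]
net_worths[-1] = 0.0

# First fit: the corrupted point drags the slope away from 6.25.
slope_before, intercept = fit_line(ages, net_worths)

# Rank points by absolute residual and keep the best 90%.
ranked = sorted(
    zip(ages, net_worths),
    key=lambda point: abs(slope_before * point[0] + intercept - point[1]),
)
kept = ranked[: len(ranked) * 9 // 10]

# Refit on the cleaned points only.
slope_after, _ = fit_line([age for age, _ in kept], [worth for _, worth in kept])

print("slope before cleaning:", slope_before)
print("slope after cleaning:", slope_after)
```

In this toy case the single corrupted point has the largest residual, so it is the one dropped and the refit recovers the underlying slope. With real, noisy data the improvement is usually partial rather than exact.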

INSTRUCTOR NOTE:
In outliers/outlier_removal_regression.py, in the section where outlier cleaning is performed (it starts with the comment ### identify and remove the most outlier-y points), make sure that the input argument to reg.predict is ages_train and not ages, so that you are cleaning based on just the training data. The arguments to the cleaner should also be the *_train variables.
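The note above can be sketched as a small helper. The name clean_training_set is hypothetical and does not appear in the course files. It only assumes that reg has a predict method and that cleaner has the same three-argument signature as outlierCleaner.

```python
def clean_training_set(reg, ages_train, net_worths_train, cleaner):
    """Run the outlier cleaner on the training split only."""
    # Predict on the training ages, not the full ages array, so that the
    # predictions line up one-to-one with net_worths_train.
    predictions = reg.predict(ages_train)

    # Every argument passed to the cleaner comes from the training split.
    return cleaner(predictions, ages_train, net_worths_train)
```

In the course script this would correspond to calling the cleaner with reg.predict(ages_train), ages_train, and net_worths_train.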